-
Notifications
You must be signed in to change notification settings - Fork 64
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
RUMM-2139: New event storage format #908
RUMM-2139: New event storage format #908
Conversation
Codecov Report
@@ Coverage Diff @@
## develop #908 +/- ##
===========================================
- Coverage 83.06% 83.04% -0.02%
===========================================
Files 266 267 +1
Lines 9068 9047 -21
Branches 1473 1453 -20
===========================================
- Hits 7532 7513 -19
+ Misses 1138 1137 -1
+ Partials 398 397 -1
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Look good from my POV 👌. Writing interfaces do not depend on data format (prefix/suffix/separator) ☑️ and encryption is vastly simplified ☑️.
...src/main/kotlin/com/datadog/android/core/internal/persistence/file/batch/BatchFileHandler.kt
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
|
||
// region FileHandler | ||
internal val delegate: FileHandler | ||
) : FileHandler by delegate { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Really nice job here!
...src/main/kotlin/com/datadog/android/core/internal/persistence/file/batch/BatchFileHandler.kt
Show resolved
Hide resolved
...src/main/kotlin/com/datadog/android/core/internal/persistence/file/batch/BatchFileHandler.kt
Outdated
Show resolved
Hide resolved
val meta = metaGenerator(data) | ||
|
||
// in Kotlin and Java byte type is signed, meaning it goes from -127 to 128. | ||
// It is completely fine to have size more than 128, it will be just stored |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
are you sure about this ? I am thinking that for example : 129
will become -127
and when you will read it back will not have the required value. Am I missing something ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, it will be fine. It is just One's complement system https://en.wikipedia.org/wiki/Ones%27_complement. When we read byte back as InputStream.read()
it returns actually an int
(it applies 0xff
mask, check here), so values greater than 127 which go to the negative zone in the byte
representation will be positive if represented.
Here is an example https://ideone.com/5GJksf.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good to know, was not aware of that, thanks for the explanation.
} else { | ||
result.copyOf(offset) | ||
if (remaining != 0) { | ||
val message = WARNING_NOT_ALL_DATA_READ.format(Locale.US, file.path) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we still send this data in this case ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good question. I was thinking about it and decided to keep it to see if this will happen in our nightly tests, just in case.
Not sure if it may happen at all with flash memory (maybe only when the system decides to clean up cache folder while we read the file?), but since we are using the binary format now, it is better to keep an eye on the file integrity to be able to catch possible issues.
|
||
// region private | ||
|
||
private fun List<ByteArray>.join( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've seen that you already created Collection<ByteArray>.join
extension for this, this looks much more simpler though, wondering why not having the same implementation also there.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because the one used in test is not very efficient - it will create new array for every reduce
operation, but it is much more readable indeed (this is an advantage for the test).
The one used in the production code creates array only once and is using copyTo
to fill it.
6b9d129
to
9ac0d9c
Compare
What does this PR do?
This PR brings a new event storage format. Now every event written to the disk will also have a header which contains metadata. Currently this metadata only includes event size in bytes, which makes possible to treat a storage as a binary file and read individual events back for the possible pre-processing before doing the upload, also removing the need to store the separator in the file.
This new format also dramatically improves the logic needed to support encryption, because knowing event binary size we don't need to rely on the separator anymore when reading it back.
Another benefit is that is file is partially corrupted we will be able to read back at least some events, instead of loosing all the data in the file.
While this PR adds metadata to each event, it is not yet propagated back to the higher levels and is using currently only for reading events - this is done to minimize the change scope for this PR.
Reader is now responsible for joining events into a single binary batch.
New event storage structure: 1 byte of header version + 1 byte of meta size + meta bytes + event bytes
Meta structure:
{ "ev_size": number }
, whereev_size
is event size in bytes.This change is not backward compatible with the previous storage format, but it is fine to have it like that - we introduce new folders, so data doesn't overlap between the formats.
Review checklist (to be filled by reviewers)